Abstract. AstroCloud is a cyber-infrastructure for astronomy research initiated by
the Chinese Virtual Observatory (China-VO) with funding support from the NDRC
(National Development and Reform Commission) and CAS (Chinese Academy of Sciences)
(Cui et al. 2014). To archive astronomical data in China, we present the implementation
of the Astronomical Data Archiving System (ADAS). Data archiving and quality control
form the infrastructure of AstroCloud. Throughout the entire data life cycle, the
archiving system standardizes data, transfers data, logs observational data, archives
ambient data, and stores these data and their metadata in a database. Quality
control covers the whole process and all aspects of data archiving.


1. Introduction 


Tens of telescopes are operating in China. Day and night, they produce
several terabytes of data. To archive and manage this volume of data, we present an
implementation of an Astronomical Data Archiving System (ADAS). Two types of data
are archived: observation data and ambient data. Observation data, such as image
FITS files, spectral FITS files, and observation logs, are produced by the telescopes
and data reduction pipelines. Ambient data describe the observing environment, such
as weather data, seeing data, and all-sky camera images.

Archived data are first stored in the observatory’s data center and then transferred
to the AstroCloud data center via ADAS. In AstroCloud, we built a Data Access
API for users and programs to access the data. The following telescopes, located at
multiple sites in China, already use this archiving system: Guo Shoujing Telescope
(LAMOST), Lijiang GMG 2.4m Telescope, Xinglong 2.16m Telescope, Delingha 50BiN
Telescope, Huairou Solar Radio Telescope, Huairou Solar Multi-Channel Telescope,
and Fuxian 1m New Vacuum Solar Telescope (NVST).

Figure 1. Data archive framework


2. Data Model 


The raw data include files and tables. FITS files mainly contain the raw data
and can hold images, spectra, etc. The tables are catalog tables, ambient data
tables, observational logs, etc. Metadata consists of two types:


• Schema Metadata: Schema Metadata stores information about all databases,
schemas, tables, and columns. The database-schema is similar to the IVOA TAP
schemas (IVO 2010).


• Archive Metadata: Archive Metadata stores the FITS files’ header information.


The database-schema is shown in Table 1. Usually one telescope has one table in 
the archive database. 


Table 1. Archive Metadata database-schema

Column Name   Definition                     Description
id            SERIAL                         Auto-incrementing integer, primary key
filename      VARCHAR(30)                    FITS file name
object        VARCHAR(30)                    Observation object
RA            NUMERIC(12, 8)                 Right ascension, default J2000
Dec           NUMERIC(12, 8)                 Declination, default J2000
filesize      INTEGER                        File size (bytes)
checksum      VARCHAR(64)                    MD5 checksum
recTime       TIMESTAMP WITHOUT TIME ZONE    Recorded time
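
As a rough illustration of the schema in Table 1, the sketch below creates an equivalent table in an in-memory SQLite database and inserts one record. SQLite stands in for the production database here, so `SERIAL` becomes `INTEGER PRIMARY KEY AUTOINCREMENT`. The table name `archive_metadata`, the helper `insert_record`, and the `CURRENT_TIMESTAMP` default on `recTime` are assumptions made for this example and do not come from the paper.

```python
import sqlite3

# Table 1 expressed for SQLite. The table name is hypothetical; the text only
# says that each telescope usually has its own table.
SCHEMA = """
CREATE TABLE archive_metadata (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,   -- SERIAL in Table 1
    filename  VARCHAR(30),
    "object"  VARCHAR(30),
    "RA"      NUMERIC(12, 8),
    "Dec"     NUMERIC(12, 8),
    filesize  INTEGER,
    checksum  VARCHAR(64),
    "recTime" TIMESTAMP DEFAULT CURRENT_TIMESTAMP  -- default is an assumption
)
"""

INSERT = (
    'INSERT INTO archive_metadata '
    '(filename, "object", "RA", "Dec", filesize, checksum) '
    'VALUES (?, ?, ?, ?, ?, ?)'
)


def insert_record(conn, filename, obj, ra, dec, filesize, checksum):
    """Insert one archive metadata row and return its auto-generated id."""
    cur = conn.execute(INSERT, (filename, obj, ra, dec, filesize, checksum))
    conn.commit()
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
```

A call such as `insert_record(conn, "obs001.fits", "M31", 10.6847, 41.2687, 2880, "5d41402abc4b2a76b9719d911017c592")` then stores one row; the values shown are made up for illustration.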


3. Archiving Software Architecture


The system consists of four submodules (Laher et al. 2014): 

Figure 2. Software Architecture

1 Data Transfer System (DTS). Data are transferred over the network on a
schedule. In the central data center at NAOC, we set up a Transfer Server to
accept incoming transfers. We chose rsync to run this service because it is
open source and performs very well (Zampieri et al. 2009).


2 Data Ingest System (DIS). DIS loads the data into the database. This
procedure parses the FITS header and selects the necessary fields to record
in the database. We use Astropy (Astropy Collaboration et al. 2013) to
manipulate the FITS files, which makes it easy to collect the FITS header
(Dobrzycki et al. 2012).


3 Logging System (LGS). All operations are logged in the database. LGS
is the procedure that logs each operation: data transfer, data ingest, database
replication, etc.


4 Archive Backup System (BKS). BKS consists of file backup, database
replication, and database backup. These operations are scheduled.
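
The field selection performed by DIS can be sketched as follows. This is a minimal illustration, not the authors' code: the mapping `COLUMN_TO_KEYWORD` and the FITS keywords in it are assumptions, since the text does not say which header keywords each telescope uses. `astropy.io.fits.getheader` reads the primary header; the selection step itself accepts any dict-like header, so it can be exercised without a FITS file.

```python
# Hypothetical mapping from Table 1 column names to FITS header keywords.
# Real telescopes may use different keywords or store RA/Dec as strings.
COLUMN_TO_KEYWORD = {"object": "OBJECT", "RA": "RA", "Dec": "DEC"}


def select_fields(header):
    """Pick the archive columns from a dict-like header.

    Keywords missing from the header are recorded as None.
    """
    return {col: header.get(key) for col, key in COLUMN_TO_KEYWORD.items()}


def read_fits_fields(path):
    """Read the primary header of a FITS file and select the archive fields."""
    # Imported here so that select_fields can be used without Astropy installed.
    from astropy.io import fits

    header = fits.getheader(path)  # header of the primary HDU
    return select_fields(header)
```

Keeping the selection separate from the file access makes it straightforward to give each telescope its own keyword mapping.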


4. Archiving Pipeline


1 Data are transferred to the data center at NAOC by DTS on a schedule.


2 After the transfer finishes, DIS starts running. It verifies each file’s
checksum, collects the FITS header, and inserts it into the archive
database.


3 Once all files have been checked and recorded in the database, the collected
information (number of files, transfer log, database log, etc.) is emailed to
the system administrator and the telescope operator.


486 He et al. 


4 The FITS files and the database are backed up by BKS on a schedule.


5 Database replication: the archive database is write-only, and the SkyTools
(Sky 2014) replication procedure replicates it to the Query Databases for
other users or systems to access, such as the Data Publish System (Fan et al. 2014).
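
Step 1 of the pipeline might be driven from Python roughly as sketched below. The paper states only that rsync is used; the particular options (`-a` for archive mode, `--partial` to keep partially transferred files), the helper names, and the example destination are assumptions for illustration. Scheduling is assumed to be handled externally, for example by cron.

```python
import subprocess


def build_rsync_command(src_dir, dest):
    """Build an rsync command that copies the contents of src_dir to dest."""
    # The trailing slash tells rsync to copy the directory's contents rather
    # than the directory itself.
    source = src_dir.rstrip("/") + "/"
    return ["rsync", "-a", "--partial", source, dest]


def run_transfer(src_dir, dest):
    """Run the transfer; raises CalledProcessError if rsync exits non-zero."""
    subprocess.run(build_rsync_command(src_dir, dest), check=True)
```

For example, `run_transfer("/data/fits", "archive@transfer.example.org:/incoming/")` would push one night's files to a hypothetical Transfer Server; a non-zero exit status could then be recorded by the logging system.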


5. Quality Control 


Data quality is controlled throughout the archiving process. In DTS, an MD5
checksum is computed for every file before and after transfer, and the transfer
procedure verifies that the two match. The database is also checked and validated
on a schedule.
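
The checksum comparison described above can be sketched with Python's standard `hashlib` module. The function names are illustrative and not taken from ADAS; the file is read in chunks so that large FITS files do not have to fit in memory.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(path, expected_md5):
    """Check that the file at path matches the checksum computed before transfer."""
    return md5sum(path) == expected_md5.lower()
```

A file whose checksum does not match would be flagged for retransfer rather than ingested.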


6. Conclusions 


We developed and implemented an astronomical data archiving system that can be
operated automatically. Once data are produced, the procedure runs quietly in the
background. When the procedure finishes, the operator receives an email with the
job details.